Asymptotically efficient adaptive allocation rules for the multiarmed bandit problem with switching - Automatic Control, IEEE Transactions on
Authors
Abstract
We consider multiarmed bandit problems with switching cost, define uniformly good allocation rules, and restrict attention to such rules. We present a lower bound on the asymptotic performance of uniformly good allocation rules and construct an allocation scheme that achieves the bound. We discover that despite the inclusion of a switching cost the proposed allocation scheme achieves the same asymptotic performance as the optimal rule for the bandit problem without switching cost. This is made possible by grouping together the samples in a certain fashion. Finally, we illustrate an optimal allocation scheme for a large class of distributions which includes members of the exponential family.
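The idea of grouping samples to limit switching can be illustrated with a minimal sketch. This is not the allocation scheme constructed in the paper, whose details the abstract does not give. It assumes Bernoulli arms, a UCB1-style index, and per-arm block lengths that double each time an arm is selected; the function name `block_ucb` and all of its parameters are hypothetical.

```python
import math
import random


def block_ucb(means, horizon, switch_cost=1.0, seed=0):
    """Play Bernoulli arms in blocks of growing length to limit switching.

    Illustrative assumptions (not taken from the paper): a UCB1-style
    index picks the arm, and the k-th block of an arm lasts 2**k pulls.
    """
    rng = random.Random(seed)
    k = len(means)
    pulls = [0] * k       # number of samples drawn from each arm
    totals = [0.0] * k    # summed reward observed from each arm
    blocks = [0] * k      # number of blocks already played per arm
    reward = 0.0
    switches = 0
    current = None
    t = 0

    while t < horizon:
        unplayed = [a for a in range(k) if pulls[a] == 0]
        if unplayed:
            # Sample every arm at least once before using the index.
            arm = unplayed[0]
        else:
            # UCB1-style index: empirical mean plus exploration bonus.
            arm = max(
                range(k),
                key=lambda a: totals[a] / pulls[a]
                + math.sqrt(2.0 * math.log(t) / pulls[a]),
            )

        # A switching cost is incurred only when the arm changes.
        if current is not None and arm != current:
            switches += 1
        current = arm

        # Commit to the chosen arm for a whole block (truncated at the horizon).
        length = min(2 ** blocks[arm], horizon - t)
        blocks[arm] += 1
        for _ in range(length):
            r = 1.0 if rng.random() < means[arm] else 0.0
            totals[arm] += r
            pulls[arm] += 1
            reward += r
        t += length

    return {
        "reward": reward,
        "switches": switches,
        "pulls": pulls,
        "net": reward - switch_cost * switches,
    }


result = block_ucb([0.3, 0.7], horizon=1000)
```

Because each arm's blocks double in length, an arm can be chosen at most about log2(horizon) + 1 times, so the number of switches in this sketch grows only logarithmically in the horizon. That is the intuition behind why grouping can keep the total switching cost small, under the assumptions stated above.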
Similar resources
Multi armed bandit problem: some insights
Multi Armed Bandit problems have been widely studied in the context of sequential analysis. The application areas include clinical trials, adaptive filtering, online advertising, etc. The problem is also characterized as selecting a policy that maximizes a gambler’s reward when multiple slot machines are generating rewards. It is under this framework that we describe the model and de...
Finite-Time Regret Bounds for the Multiarmed Bandit Problem
We show finite-time regret bounds for the multiarmed bandit problem under the assumption that all rewards come from a bounded and fixed range. Our regret bounds after any number T of pulls are of the form a + b log T + c log² T, where a, b, and c are positive constants not depending on T. These bounds are shown to hold for variants of the popular ε-greedy and Boltzmann allocation rules, and for a ...
On the Optimal Reward Function of the Continuous Time Multiarmed Bandit Problem
The optimal reward function associated with the so-called "multiarmed bandit problem" for general Markov-Feller processes is considered. It is shown that this optimal reward function has a simple expression (product form) in terms of individual stopping problems, without requiring any smoothness properties of the optimal reward function, either for the global problem or for the individual stopping probl...
MATHEMATICAL ENGINEERING TECHNICAL REPORTS An Asymptotically Optimal Policy for Finite Support Models in the Multiarmed Bandit Problem
We propose the minimum empirical divergence (MED) policy for the multiarmed bandit problem. We prove asymptotic optimality of the proposed policy for the case of finite support models. In our setting, Burnetas and Katehakis [3] have already proposed an asymptotically optimal policy. For choosing an arm, our policy uses a criterion which is dual to the quantity used in [3]. Our criterion is easily com...
Asymptotically Efficient Allocation Rules for the Multiarmed Bandit Problem with Multiple Plays - Part II: Markovian Rewards
At each instant of time we are required to sample a fixed number m ≥ 1 out of N Markov chains whose stationary transition probability matrices belong to a family suitably parameterized by a real number θ. The objective is to maximize the long run expected value of the samples. The learning loss of a sampling scheme corresponding to a parameters configuration C = (θ_1, ..., θ_N) is quantified...